Introduction

Random number generation has long been an important part of human life, and the methods used
to generate random numbers have evolved as new uses for them have developed. Random numbers
can be generated by physical methods such as die rolling (DiCarlo 4). However, the growth in
the practical uses of random numbers has driven improvements in the procedures used to
produce sequences of random numbers.


In recent times, random numbers have come to be used in government-run lotteries, video
games and modern slot machines. Although the random number generators in some programming
languages do not produce true randomness, they produce sequences of numbers that pass
standard statistical tests of randomness, which measure the unpredictability of the numbers
generated. Since these numbers are produced by algorithms, it should in principle be possible
to find the pattern by which they are produced and hence make the values predictable if the
right methods are applied.


With increasing development in technology, machine learning has become a popular method of
detecting trends in data. Machine learning has numerous applications such as face detection,
search engines and weather prediction systems. As such, this investigation seeks to determine
the extent to which machine learning can be used as a method of predicting the outcome of
random number generators.


Furthermore, random numbers are a crucial component of computations in computers, and since
they are used in data encryption algorithms (Haahr), their predictability would allow security
keys to be breached by anyone who can predict the outcome of random number generators. This
could lead to a widespread compromise of security systems, which would in turn breach the
privacy of individuals and corporations.

The negative implications of predictable random number generators described above show that
this investigation is relevant to current society. Usually, a random number generator is
easily predictable when the starting value of the sequence of numbers is known. However, this
investigation seeks to predict the outcome of a random number generator using machine
learning even when the initial value is unknown.

Background Information
A random number is a number that forms part of a sequence in which the values are spread
uniformly over a defined interval and where there is no possibility of predicting future values
based on past or present values (Rouse).

According to the Merriam-Webster dictionary, to predict is defined as “to declare or indicate
in advance; especially: foretell on the basis of observation, experience, or scientific reason.”


Machine Learning 

Machine learning is the study of algorithms that are designed to make computers perform
functions without human assistance and without being explicitly programmed to do so (Stanford
University). Machine learning algorithms are made in order to develop applications that can
extend their functionality based on example datasets provided to them (Schapire 1). Machine
learning forms a crucial part of artificial intelligence and hence is applied in systems related
to intelligence such as language or vision. Other examples of applications of machine learning
are face detection programs which find faces in images, spam filtering software that
identifies whether a message is spam or not, weather prediction software and search engines
(Schapire 1-2).


Types of Machine Learning Algorithms 

The various machine learning algorithms are categorized into two main sets, namely supervised
and unsupervised learning. The distinction between these two groups is that supervised
learning deals with problems where a data set is provided and the correct output is known,
whereas in unsupervised learning there is no knowledge or idea of what the result of the
computation should look like (Stanford University). Supervised learning algorithms are
further divided into two groups based on the kinds of problems they solve: classification
problems and regression problems.


Classification problems are problems where the given sets of data have to be grouped into
defined classifications. The algorithms that deal with classification problems examine the
data to detect similarities that will be used in grouping the data (Stanford University).
Examples of such machine learning algorithms are the decision tree, the Support Vector
Machine, the Naive Bayes algorithm and the Logistic Regression algorithm (Ray).


On the other hand, the algorithms that are tailored to solve regression problems do not group
the given datasets but rather predict results within a continuous output. This is done by
establishing a relationship between the input variables and a continuous function (Ray). An
example of this is using numerous examples of the land area of houses and their
corresponding prices to establish the relationship between these two variables. The
established relationship is then used to determine the price of a house when given its land
area (Stanford University). This investigation aims to see how far the outcome of a random
number generator can be predicted using a machine learning algorithm. The linear regression
machine learning algorithm was chosen to perform the experiment because this investigation
falls in the category of supervised learning, since it involves the prediction of output values
based on some input values.


Linear Regression Machine Learning Algorithm
The Linear Regression machine learning algorithm functions by establishing a general
relationship between some independent and dependent variables (Ray). The relationship
established is represented by a best fit line, which is described by the linear equation:

Y = mX + b

where Y = the dependent variable, X = the independent variable, m = the slope of the line and
b = the y-intercept of the line.


The coefficients m and b are determined by minimizing a cost function, which will be
explained in more detail in the next section. In order to better understand Linear Regression,
consider being asked to arrange five books of different sizes in a library according to their
weights without using a weighing scale. The most intuitive way to do this would probably be
to consider the size of each book, what material it is made of and possibly whether it is a
hard cover or soft cover book, and to estimate the weights based on these observations. In the
same way that the observer establishes a relationship between the features of the book
(independent variables) and its weight (dependent variable), the Linear Regression algorithm
establishes a relationship between a dependent and an independent variable in the form of a
best fit line (Ray).


There are two major kinds of linear regression: Simple Linear Regression, in which the
dependent variable depends on only one independent variable, and Multiple Linear Regression,
in which the dependent variable depends on two or more independent variables (Ray). Because
each prediction in this investigation is based on many preceding values, it was necessary to
implement Multiple Linear Regression.
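
As a minimal sketch of what the hypothesis of Multiple Linear Regression looks like, the Ruby
snippet below computes a prediction from a list of parameters and a list of feature values.
The method name `predict` and the parameter values are assumptions made for illustration only;
they are not taken from the code used in this investigation.

```ruby
# Hypothesis of multiple linear regression:
#   h(x) = theta[0] + theta[1] * x[0] + ... + theta[n] * x[n - 1]
# theta[0] plays the role of the intercept b; the other entries are one slope per feature.
def predict(theta, features)
  theta[0] + theta.drop(1).zip(features).sum { |t, x| t * x }
end

# Made-up parameters for a model with two features.
theta = [1.0, 2.0, 0.5]
puts predict(theta, [3.0, 4.0])   # 1.0 + 2.0 * 3.0 + 0.5 * 4.0
```

With a single feature this reduces to the best fit line Y = mX + b described above.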


Cost function and Optimization Algorithms 

The success of linear regression machine learning algorithms depends on their ability to find
the values of the parameters of a function that minimize a cost function (Brownlee). This takes
place in a process called training, in which the linear regression algorithm learns the
pattern of the datasets used for training and calculates values called theta (θ) values, with
which it can predict the outcome of other datasets that were not used to train the algorithm.
During training, the algorithm repeatedly computes its prediction (hypothesis) using the
current values of the parameters and then compares this hypothesis with the actual outcome for
each example given to it. The cost function measures the margin of error between the
hypothesis and the actual value, and training seeks the parameter values that minimize it.
These values are conventionally represented by θ (Stanford University).


The component of a machine learning algorithm that finds the theta (θ) values is called an
optimization algorithm. The role of the optimization algorithm is integral to the machine
learning algorithm in that it computes the theta (θ) values which are used to calculate the
predicted output. For this investigation, I applied two different optimization algorithms,
namely gradient descent and the normal equation.


Gradient Descent Optimization Algorithm 

The intuition behind gradient descent is easily understood when the cost function is imagined
to be a large bowl (Brownlee). The bottom of the bowl is the part with the lowest cost, i.e.
the part of the cost function whose parameters yield the smallest difference between the
predicted output value and the actual output value. The function of gradient descent,
therefore, is to locate the bottom of the bowl from any starting position (Brownlee). It
achieves this by computing the gradient at its current position and then updating the value
of θ so that it moves to a position closer to the minimum point of the bowl. This is done by
moving in steps whose magnitude is set by the person who trains the algorithm. The gradient
descent algorithm repeats this process until there is convergence, i.e. until the bottom of
the bowl is reached.
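
The idea can be sketched in a few lines of Ruby. This is only an illustrative version of
batch gradient descent with a mean-squared-error cost, not the Octave code used in the
investigation; the method name, the step size `alpha`, the number of iterations and the data
are all assumptions chosen for the example.

```ruby
# Batch gradient descent for linear regression with a mean-squared-error cost.
# rows:    array of feature arrays (without the intercept term)
# targets: array of expected outputs
# alpha:   step size, chosen by the person training the model
def gradient_descent(rows, targets, alpha: 0.05, iterations: 5000)
  m = rows.length
  xs = rows.map { |r| [1.0] + r }            # prepend the intercept term
  theta = Array.new(xs.first.length, 0.0)    # arbitrary starting position

  iterations.times do
    # Error of the current hypothesis on every training example.
    errors = xs.each_with_index.map do |x, i|
      x.zip(theta).sum { |xj, tj| xj * tj } - targets[i]
    end
    # Gradient of the cost with respect to each theta value.
    gradient = theta.each_index.map do |j|
      xs.each_with_index.sum { |x, i| errors[i] * x[j] } / m
    end
    # Take one step downhill.
    theta = theta.each_index.map { |j| theta[j] - alpha * gradient[j] }
  end
  theta
end

# Made-up data that follows y = 1 + 2x exactly.
theta = gradient_descent([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
puts theta.inspect   # should be close to [1.0, 2.0]
```

If `alpha` is too large the steps overshoot the bottom of the bowl and the values diverge; if
it is too small, many more iterations are needed before convergence.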

Figure 1: The Gradient descent algorithm

Figure 2: Gradient Descent Intuition

Normal Equation Optimization Algorithm

On the other hand, the normal equation algorithm calculates theta (θ) analytically through
matrix multiplication. This is done by grouping the dataset into two matrices, one which
contains only the output data and the other which contains the variables that relate to each
output. Using the example of the features of a house and its price, the output matrix will
contain only the prices of all the houses, and the variable matrix will contain the
characteristics of each house that contribute to its price, such as the number of bedrooms,
the number of bathrooms etc. (Stanford University). The two matrices are then combined by
matrix multiplication, arranged so that the result is a single column of theta (θ) values,
one for each feature. These theta (θ) values are then used to predict the output values for
other sets of data which may not have been used to train the linear regression algorithm.
Figure 3 explains how the value of theta is computed using the normal equation algorithm.
This optimization algorithm typically takes a shorter time to run since it is an analytical
approach that needs no repeated steps. However, the normal equation algorithm will not work
properly for large datasets (about 150,000 independent variables and over) (Stanford
University).
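
A rough Ruby sketch of this analytical approach is given below. It assumes the usual form of
the normal equation, (X^T X)θ = X^T y, and solves it by Gaussian elimination instead of
computing a matrix inverse explicitly; the method name and the data are made up for
illustration and this is not the Octave code used in the investigation.

```ruby
# Solve the normal equations (X^T X) theta = X^T y by Gaussian elimination.
def normal_equation(rows, targets)
  xs = rows.map { |r| [1.0] + r }                 # prepend the intercept term
  n = xs.first.length
  # Build the augmented matrix [X^T X | X^T y].
  a = Array.new(n) do |j|
    Array.new(n) { |k| xs.sum { |x| x[j] * x[k] } } +
      [xs.each_with_index.sum { |x, i| x[j] * targets[i] }]
  end
  # Forward elimination with partial pivoting.
  n.times do |col|
    pivot = (col...n).max_by { |r| a[r][col].abs }
    a[col], a[pivot] = a[pivot], a[col]
    (col + 1...n).each do |r|
      factor = a[r][col] / a[col][col]
      (col..n).each { |k| a[r][k] -= factor * a[col][k] }
    end
  end
  # Back substitution.
  theta = Array.new(n, 0.0)
  (n - 1).downto(0) do |j|
    rest = (j + 1...n).sum { |k| a[j][k] * theta[k] }
    theta[j] = (a[j][n] - rest) / a[j][j]
  end
  theta
end

# Made-up data that follows y = 1 + 2x exactly.
puts normal_equation([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0]).inspect
```

The sketch does not guard against a singular X^T X (for example, duplicated feature columns),
in which case the divisions produce meaningless values.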

Figure 3: Normal Equation Intuition

Random Number Generator

According to techopedia.com, “A random number generator (RNG) is a mathematical
construct, either computational or as a hardware device, that is designed to generate a random
set of numbers that should not display any distinguishable patterns in their appearance or
generation, hence the word random.”

As the definition suggests, a random number generator is expected to produce a sequence of
numbers that do not display any perceptible patterns or trends within the sequence.
Nevertheless, there are some types of random number generators that produce numbers with an
apparent but synthetic randomness, and there are other types that generate numbers with true
randomness (DiCarlo 6-7). Random number generators are therefore classified into two main
groups: the Pseudo Random Number Generators (PRNGs) and the True Random Number Generators
(TRNGs).

Pseudo Random Number Generators (PRNGs)

Pseudo random number generators are those that use algorithms that contain mathematical
formulae or pre-calculated tables to generate sequences of random number values (Haahr).
They generate the sequence of random values by feeding an algorithm with an initial value
called the seed value. Some PRNGs allow the seed value to be given either by the user or by
the system itself (ruby-doc.org). The algorithm uses the seed value to generate a number
which then becomes the seed value for the next computation, and this process repeats itself to
generate a sequence of seemingly random values (Khan Academy).
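
To make this feedback loop concrete, the following Ruby sketch implements a toy linear
congruential generator, one of the simplest kinds of PRNG. It is an assumption chosen purely
for illustration: the class name is hypothetical, the constants are commonly cited textbook
values, and this is not the algorithm that Ruby itself uses.

```ruby
# A toy linear congruential generator (LCG): each new state is computed from the
# previous one, so the whole sequence is fixed once the seed is chosen.
class ToyLcg
  MODULUS = 2**32
  MULTIPLIER = 1_664_525
  INCREMENT = 1_013_904_223

  def initialize(seed)
    @state = seed % MODULUS
  end

  # Advance the state and return a digit between 0 and 9.
  def next_digit
    @state = (MULTIPLIER * @state + INCREMENT) % MODULUS
    (@state >> 16) % 10   # use the higher-order bits of the state
  end
end

a = ToyLcg.new(2024)
b = ToyLcg.new(2024)
puts Array.new(5) { a.next_digit }.inspect
puts Array.new(5) { b.next_digit }.inspect   # same seed, so the same five digits
```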


Consequently, pseudo random number generators are not truly random, as the name “pseudo”
already implies. The values they produce do appear random but are in fact predetermined by
the formula or the pre-calculated table of values being used by the generator (Haahr). This
characteristic also means that pseudo random number generators are deterministic, in the sense
that a given sequence of random numbers can be regenerated once the seed value of the sequence
is given (Haahr). Furthermore, they are periodic, meaning that after a certain number of
iterations the sequence will repeat itself. This may not be a desirable quality to many users
of pseudo random numbers, but most modern PRNGs possess a period that is long enough for
practical uses (Haahr).


Pseudo random number generators are highly efficient because they can produce a large number
of random values within a short period of time. This useful quality warrants their
application in simulations and modelling software (Haahr). Although PRNGs are useful in
these applications, they are not advised for software that deals with data encryption and
gambling, due to their predictable nature.

True Random Number Generators (TRNGs)

True random number generators are those that produce sequences of values that have
authentic random characteristics. They achieve this by extracting the randomness from
physical phenomena that are random in themselves, for example by using data obtained from a
radioactive source or atmospheric noise (Haahr). Unlike PRNGs, TRNGs are inefficient because
they take a long time to produce values. They are also non-deterministic and do not have any
period, which means that the sequence of random numbers produced does not repeat itself
after a number of iterations. Examples of TRNGs are the HotBits service at Fourmilab in
Switzerland and the lavarand generator built by Silicon Graphics, which is no longer in
operation. TRNGs are useful for the generation of data encryption keys and for lotteries and
draws, where it is crucial that the values used are truly random and cannot be predicted by
any means.


Ruby Random Number Generator 

Ruby, like other object-oriented programming languages such as Java and Python, utilizes a
PRNG to generate sequences of random values. The Mersenne Twister is a very popular PRNG, and
Ruby uses a modified Mersenne Twister generator which has a period of 2^19937 - 1. The Ruby
PRNG is initialized with either a system-generated seed value or one that is provided by the
user, and like most PRNGs it is useful for simulations and modelling applications
(ruby-doc.org). Ruby was chosen for this investigation for two reasons: it allows the user to
select the seed for the PRNG with ease, and it makes use of a PRNG that is common to numerous
higher-level programming languages. As such, the Ruby PRNG stands as a good representative of
a standard PRNG.
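
The deterministic behaviour of the Ruby PRNG can be observed directly with the built-in
`Random` class. The sketch below is illustrative only; the seed 42 is an arbitrary choice and
is not the seed used in this investigation.

```ruby
# Two generators created with the same seed produce identical sequences.
first = Random.new(42)
second = Random.new(42)

seq_a = Array.new(10) { first.rand(10) }    # integers from 0 to 9
seq_b = Array.new(10) { second.rand(10) }

puts seq_a.inspect
puts seq_a == seq_b    # true: the sequence is fully determined by the seed

# Without an explicit seed, Ruby chooses one itself; it can be read back with #seed.
unseeded = Random.new
puts unseeded.seed
```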

Methodology

In carrying out the investigation, it was first necessary to generate a set of pseudo-random
numbers with the Ruby random number generator (see appendix) and store them in a Notepad file
to make the data accessible to the Linear Regression algorithm (see appendix). The random
numbers generated were stored in an arrangement of rows and columns. Then, in order to train
the algorithm, a section of the data was loaded into the linear regression algorithm. After
the training process, the remaining section of the data that was not used to train the
algorithm was used to test its predictive ability. The results predicted by the algorithm
were then compared with the actual values in order to calculate the error margin between each
prediction and the actual value. Komodo IDE was used to run the Ruby code (see appendix) that
produced the random numbers using Ruby's random number generator. The linear regression
algorithm, with both the gradient descent and normal equation optimization algorithms, was
implemented in GNU Octave, which is a programming language used for scientific programming.


Investigation 

To better understand the method used, consider, for instance, a file with 150 rows of random
numbers, each with 20 columns. 100 rows of data with 20 columns each would be loaded into the
algorithm for training. The remaining 50 rows of random numbers would then be used to test
the predicting ability of the algorithm after it has been trained. The testing would be done
by inputting into the algorithm a single row of data which was not used for training, up to
the 19th column. The algorithm would then be tasked to predict the 20th number of that row. In
order to increase the accuracy of my results, the two optimization algorithms, gradient
descent and normal equation, were both implemented in the Linear Regression machine learning
algorithm. This approach was used for my investigation because it seemed to be an effective
way of testing the patterns derived by the algorithm from the training set of data.
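
The split described in this example can be sketched in Ruby as follows. The table is filled
with freshly generated digits only so that the snippet runs on its own; the seed, the variable
names and the helper method are assumptions made for illustration.

```ruby
# Split a table of rows into a training part and a test part, then separate
# the first 19 columns (inputs) from the 20th column (the value to predict).
COLUMNS = 20
rng = Random.new(1)   # arbitrary seed, only so that the sketch is repeatable
table = Array.new(150) { Array.new(COLUMNS) { rng.rand(10) } }

training_rows = table.first(100)
test_rows = table.drop(100)

def split_inputs_and_target(row)
  [row[0...-1], row[-1]]
end

inputs, target = split_inputs_and_target(test_rows.first)
puts training_rows.length   # 100
puts test_rows.length       # 50
puts inputs.length          # 19
```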


Figure 4: Figure showing how experiment was carried out

[Screenshot of the GNU Octave editor and Command Window with two annotations: "Part of Linear
Regression Algorithm that loads data from notepad file" and "Predicted Value 30th number of a
row of random numbers".]

Code shown in the editor:

    %% Load Data
    data = csvread('EEtestdata5.txt');
    X = data(1:450, 1:29);
    y = data(1:450, 30);
    m = length(y);

    % Add intercept term to X
    X = [ones(m, 1) X];

Output shown in the Command Window:

    0.051212
    -0.028321
    -0.013022
    0.032985
    0.052771
    0.001865
    0.008198
    0.035574
    -0.002018
    0.051636

    Predicted 30th integer (using normal equations):
    4.547603



Procedure in Steps
1. 500 rows and 30 columns of random integers ranging from 0 to 9 were generated
   using Ruby's pseudo-random number generator (see appendix).
2. 50 rows of random numbers, each with 30 columns, were loaded into the linear
   regression algorithm for training.
3. Both the gradient descent and normal equation optimization algorithms were applied
   in the training of the linear regression model.
4. Afterwards, the fifteen rows following the 50th row of data were used to test the
   predicting ability of the algorithm. This was done by inputting the first 29 values of a
   single row into the algorithm while omitting the 30th value, which the algorithm had to
   predict. The 29 values of the next row of random numbers were then input into the
   algorithm for the prediction of its 30th value. This process was repeated until the 30th
   values of all fifteen rows had been predicted.
5. Steps two to four were repeated, increasing the number of rows of data loaded into the
   algorithm by 50 with each repetition. The numbers of rows of data used for training were
   therefore 100, 150, 200, 250, 300, 350, 400 and 450 rows of random numbers from the same
   Notepad file.
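
Step 1 could be carried out along the following lines. This is a hedged sketch and not the
appendix code: the method names, the seed 12345 and the file name in the final comment are all
hypothetical, and a comma-separated layout is assumed because the data is later read with
csvread in Octave.

```ruby
# Generate a table of pseudo-random digits and format it as comma-separated text.
def generate_rows(row_count, column_count, seed)
  rng = Random.new(seed)
  Array.new(row_count) { Array.new(column_count) { rng.rand(10) } }
end

def to_csv(rows)
  rows.map { |row| row.join(",") }.join("\n") + "\n"
end

rows = generate_rows(500, 30, 12345)   # the seed 12345 is an arbitrary choice
csv_text = to_csv(rows)
puts csv_text.lines.first

# To save the table for later loading (the file name is hypothetical):
# File.write("random_numbers.txt", csv_text)
```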


Data Presentation 
The data collected for each size of training set was organized in tables which show, for each
test row, the predicted value, the actual value and the error margin of the prediction, as
well as the average error margin of prediction for the training set.

Table 1: Table Showing the Expected 30th Numbers, their corresponding Predicted
Values and the Margins of error for training set of 50 rows for both Gradient Descent
and Normal Equation optimization algorithms

Training Set = 50 rows

Row | Gradient Descent Error margin | Normal Equation Error margin
51  | 5.3647  | 5.4104
52  | 4.9021  | 5.1375
53  | 1.7476  | 1.6851
54  | 5.1434  | 5.3414
55  | 7.3922  | 7.5505
56  | 3.1043  | 3.4344
57  | 7.276   | 7.2378
58  | 11.6993 | 11.8716
59  | 0.5847  | 0.562
    | 13.0943 | 13.2164
    | 1.9259  | 2.2103
    | 6.9269  | 6.9468
    | 1.8282  | 1.7478
    | 3.668   | 3.8963

Ave Error Margin | 5.27300667 | 5.3800467


Table 2: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 100 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 100 rows

    |                      | Gradient Descent                      | Normal Equation
Row | Expected 30th Number | Predicted 30th Number | Error margin | Predicted 30th Number | Error margin
101 | 2 | 1.433  | 0.567  | 1.4332 |
102 |   | 3.5941 | 3.4059 | 3.5925 | 3.4075
103 |   | 2.291  | 5.709  | 2.301  |
104 |   | 2.2526 | 1.2526 | 2.248  | 1.248
105 |   | 3.6815 | 5.3185 | 3.6781 |
106 |   | 4.5726 | 0.5726 | 4.5742 | 0.5742
107 |   | 3.781  | 1.219  | 3.7786 | 1.2214
108 |   | 3.1458 | 0.8542 | 3.1415 | 0.8585
109 |   | 2.4163 | 3.5837 | 2.4126 | 3.5874
110 |   | 2.4843 | 3.5157 | 2.4811 | 3.5189
111 |   | 3.2783 | 1.2783 | 3.2811 | 1.2811
112 |   | 3.1466 | 1.1466 | 3.1499 | 1.1499
113 |   | 6.5979 | 1.5979 | 6.5955 | 1.5955
114 |   | 2.2565 | 4.7435 | 2.2558 | 4.7442
115 |   | 5.7102 | 2.2898 | 5.7152 | 2.2848

Ave Error Margin (Gradient Descent): 2.47028667
Ave Error Margin (Normal Equation): 2.4706067

Table 3: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 150 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 150 rows

    |                      | Gradient Descent                      | Normal Equation
Row | Expected 30th Number | Predicted 30th Number | Error margin | Predicted 30th Number | Error margin
    | 4 | 3.3989 | 0.6011 | 3.3988 | 0.6012

Ave Error Margin (Gradient Descent): 3.16711333
Ave Error Margin (Normal Equation): 3.16716

Table 4: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 200 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 200 rows

    |                      | Gradient Descent                      | Normal Equation
Row | Expected 30th Number | Predicted 30th Number | Error margin | Predicted 30th Number | Error margin
201 | 1 | 5.3734 | 4.3734 |        |
202 | 7 | 5.1083 | 1.8917 |        |
203 | 5 | 3.6721 | 1.3279 |        |
204 | 5 | 5.3651 | 0.3651 |        |
205 | 5 | 5.4252 | 0.4252 |        |
206 | 0 | 5.8731 | 5.8731 | 5.8729 | 5.8729
207 | 3 | 4.665  | 1.665  |        |
208 | 6 | 4.9658 | 1.0342 |        |
209 | 3 | 5.7139 | 2.7139 |        |
210 | 0 | 4.7467 | 4.7467 |        |
211 | 0 | 5.4851 | 5.4851 | 5.4845 | 5.4845
212 | 9 | 7.1385 | 1.8615 |        |
213 | 4 | 6.1785 | 2.1785 |        |
214 | 9 | 4.0043 | 4.9957 |        |
215 | 8 | 4.579  | 3.421  |        |

Ave Error Margin (Gradient Descent): 2.82386667
Ave Error Margin (Normal Equation): 2.8237733

Table 5: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 250 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 250 rows

Error margin
0.2549
0.0905
3.4474
2.2585
5.8727
4.7457
4.5899
0.6141
4.1301

Ave Error Margin: 2.6147267


Table 6: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 300 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 300 rows

    |                      | Gradient Descent                      | Normal Equation
Row | Expected 30th Number | Predicted 30th Number | Error margin | Predicted 30th Number | Error margin
301 | 5 | 3.899  | 1.101 | 3.899 | 1.101
302 | 6 | 2.2867 |       |       |
303 | 6 | 3.7849 |       |       |
304 | 5 | 4.7642 |       |       |
305 | 2 | 7.2896 |       |       |
306 | 0 | 4.3489 |       |       |
307 | 0 | 4.4517 |       |       |
308 | 2 | 4.8123 |       |       |
309 | 0 | 5.5426 |       |       |
310 | 5 | 4.9231 |       |       |
311 | 3 | 5.758  |       |       |
312 | 3 | 2.7208 |       |       |
313 | 8 | 4.1423 |       |       |
314 | 6 | 5.2332 |       |       |
315 | 6 | 3.8584 |       |       |

Ave Error Margin (Gradient Descent): 2.63936667
Ave Error Margin (Normal Equation): 2.639353333

Table 7: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 350 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 350 rows

Row | Expected 30th Number | Predicted 30th Number | Error margin
351 | 6 |        | 0.4193
353 | 4 |        | 0.5646
355 | 9 | 3.5393 | 5.4607
356 | 7 |        | 2.403
357 | 3 | 4.4045 | 1.4045
358 | 6 |        | 1.1553
362 |   | 4.0256 |

Ave Error Margin: 1.93524



Table 8: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 400 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 400 rows

    |                      | Gradient Descent                      | Normal Equation
Row | Expected 30th Number | Predicted 30th Number | Error margin | Predicted 30th Number | Error margin
401 | 5 | 4.1837 | 0.8163 | 4.1837 | 0.8163
402 | 3 | 3.4527 | 0.4527 | 3.4527 | 0.4527
403 | 7 | 5.1479 | 1.8521 | 5.1479 | 1.8521
404 | 9 | 3.5131 | 5.4869 | 3.5131 | 5.4869
405 | 0 | 4.7656 | 4.7656 | 4.7656 | 4.7656
406 | 5 | 4.0539 | 0.9461 | 4.0539 | 0.9461
407 | 5 | 5.2019 | 0.2019 | 5.2019 | 0.2019
408 | 1 | 2.8841 | 1.8841 | 2.8841 | 1.8841
409 | 4 | 3.9377 | 0.0623 | 3.9377 | 0.0623
410 | 6 | 5.9239 | 0.0761 | 5.9239 | 0.0761
411 | 4 | 4.4036 | 0.4036 | 4.4036 | 0.4036
412 | 4 |        |        |        | 0.3616
413 | 5 |        |        |        | 0.6847
414 | 0 |        |        |        | 3.4709
415 | 2 |        |        |        | 2.6702

Ave Error Margin (Normal Equation): 1.60900667

Table 9: Table Showing the Expected 30th Number, their corresponding Predicted Value
and the Margin of error for training set of 450 rows for both Gradient Descent and
Normal Equation optimization algorithms

Training Set = 450 rows

    |                      | Gradient Descent                      | Normal Equation
Row | Expected 30th Number | Predicted 30th Number | Error margin | Predicted 30th Number | Error margin
451 | 1 | 3.7478 |  | 3.7478 | 2.7478
452 | 6 | 5.4145 |  | 5.4145 | 0.5855

Ave Error Margin: 2.748146667



Data Analysis 

In analysing the data presented, Microsoft Excel was used to calculate the average error
margin of the predictions for each training set and to look for a relationship between the
error margin of the predictions and the number of rows used as the training set for the
algorithm. I did this as a step towards finding the least possible margin of error in each
case, because finding the least possible margin of error brings the investigation closer to
its conclusion: it seeks to find the extent to which random integers can be predicted by
using linear regression as a machine learning algorithm.
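
The average error margin itself is a simple calculation. Although Excel was used in this
investigation, the same computation can be sketched in Ruby; the method name is an assumption,
and the two sample rows are the normal equation entries for rows 451 and 452 in Table 9.

```ruby
# Average error margin: the mean of |expected - predicted| over the test rows.
def average_error_margin(expected, predicted)
  margins = expected.zip(predicted).map { |e, p| (e - p).abs }
  margins.sum / margins.length
end

# Rows 451 and 452 of Table 9 (normal equation).
expected = [1, 6]
predicted = [3.7478, 5.4145]
puts average_error_margin(expected, predicted)
```

The averages reported in the tables are taken over all fifteen test rows of each training set,
so this two-row value is not comparable with them.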

Table 10: Table of Average Error Margins and the Number of Rows of Data used as the
training set

Number of Rows used as training set | Gradient Descent Average Error Margin | Normal Equation Average Error Margin
50  | 5.273006667 | 5.380046667
100 | 2.470286667 | 2.470606667
150 | 3.167113333 | 3.16716
200 | 2.823866667 | 2.8237733
250 | 2.64136     | 2.6147267
300 | 2.63936667  | 2.6393533
350 | 1.93524     | 1.93524
400 | 1.60900667  | 1.60900667
450 | 2.74814667  | 2.74814667


The data shown in Table 10 are the combined results of my experimentation with the various
sizes of datasets used for training, and show how the average error margin of prediction
varied as the size of the training set was increased. The average margin of error values are
given to a large number of decimal places in order to achieve as much precision as possible in
the analysis of the data. A general trend is seen from the table: as the number of rows of
random numbers used for training increased, the average margin of error of the predictions
tended to decrease. In order to analyse these results further, I represented the data in two
graphs, one for the results achieved using gradient descent and the other for the results
achieved using the normal equation approach.

The two following graphs make it easier to see the trend in the results.

Graph 1: A Graph of Average Error Margin against the Number of Rows of Data tested
using Gradient Descent

[Graph: Average Error Margin (vertical axis) against Number of Rows of Data used as training
sets (horizontal axis), with a logarithmic trend line y = -1.15 ln(x) + 8.9489, R² = 0.6385]

Graph 2: A Graph of Average Error Margin against the Number of Rows of Data tested
using Normal Equation

[Graph: Average Error Margin (vertical axis) against Number of Rows of Data used as training
sets (horizontal axis), with a logarithmic trend line y = -1.188 ln(x) + 9.1608, R² = 0.6406]



The graphs for both the gradient descent and normal equation optimisation algorithms show
that the relationship between the average error margin and the number of rows of data used as
the training set is logarithmic, which does not lend itself to a clear analysis. In order to
analyse the relationship better, the graphs were converted into straight-line graphs by
plotting Ln (Ave. error margin) against the number of rows used as the training set. The
following graphs demonstrate this.
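
The straight-line conversion amounts to fitting a least-squares line to the points
(x, ln(y)). Excel was used for this in the investigation; the Ruby sketch below only
illustrates the same idea, with a hypothetical method name, using the gradient descent column
of Table 10 (the 400-row entry is read as 1.60900667).

```ruby
# Least-squares straight line through the points (x, ln(y)).
# Returns [slope, intercept] of ln(y) = slope * x + intercept.
def fit_log_line(xs, ys)
  logs = ys.map { |y| Math.log(y) }
  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_log = logs.sum / n
  sxy = xs.zip(logs).sum { |x, l| (x - mean_x) * (l - mean_log) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  slope = sxy / sxx
  [slope, mean_log - slope * mean_x]
end

rows = [50, 100, 150, 200, 250, 300, 350, 400, 450]
# Gradient descent column of Table 10.
margins = [5.273006667, 2.470286667, 3.167113333, 2.823866667, 2.64136,
           2.63936667, 1.93524, 1.60900667, 2.74814667]
slope, intercept = fit_log_line(rows, margins)
puts format("ln(Ave. Error Margin) = %.4fx + %.4f", slope, intercept)
```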


Graph 3: A Graph of Ln (Average Error Margin) against the Number of Rows of Data 
tested using Gradient Descent 

[Scatter plot with linear trendline. Vertical axis: Ln (Average Error Margin). Horizontal axis: 
Number of rows of data used as training set (axis marked from 0 to 500). 
Trendline: y = -0.0016x + 1.3951] 


Graph 4: A Graph of Ln (Average Error Margin) against the Number of Rows of Data 
tested using Normal Equation 

[Scatter plot with linear trendline. Vertical axis: Ln (Average Error Margin). Horizontal axis: 
Number of rows of data used as training set (axis marked from 0 to 500). 
Trendline: y = -0.0017x + 1.4029] 

The straight-line graph for the results achieved by Gradient Descent has the equation 
y = -0.0016x + 1.3951 

That of Normal Equation has the equation 
y = -0.0017x + 1.4029 


From these equations, a relationship between the average error margin between predictions of 
the random numbers and their corresponding expected values, and the number of rows of data 


used to train the linear regression algorithm can be derived. 


Equation derived from the equation of the graph for Gradient Descent: 

ln(Ave. Error Margin) = -0.0016x + 1.3951 
Ave. Error Margin = e^{-0.0016x + 1.3951} 
Where x = the number of rows used as a training set 


Equation derived from the equation of the graph for Normal Equation: 

ln(Ave. Error Margin) = -0.0017x + 1.4029 
Ave. Error Margin = e^{-0.0017x + 1.4029} 
Where x = the number of rows used as a training set 
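
A straight-line fit of this kind, and the exponential form derived from it, can be reproduced with ordinary least squares on the pairs (x, ln(error)). The sketch below is a minimal illustration in Ruby and is not necessarily how the trendlines in this investigation were produced. The `fit_line` helper and the data points are hypothetical: the points are generated from a known curve purely so that the recovered slope and intercept can be checked.

```ruby
# Ordinary least-squares fit of ln(error) = slope * x + intercept.
# The data below is synthetic and only illustrates the method.

def fit_line(xs, ys)
  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  slope = sxy / sxx
  [slope, mean_y - slope * mean_x]
end

# Synthetic errors that decay exactly as e^(-0.002x + 1.5)
rows = [50, 150, 250, 350, 450]
errors = rows.map { |x| Math.exp(-0.002 * x + 1.5) }

# Fit a straight line to the log-transformed errors
slope, intercept = fit_line(rows, errors.map { |e| Math.log(e) })

# Exponentiating both sides turns the line back into an error model
predict_error = ->(x) { Math.exp(slope * x + intercept) }

puts format("ln(error) = %.4fx + %.4f", slope, intercept)
puts format("error at 450 rows = %.4f", predict_error.call(450))
```

Because the synthetic points lie exactly on an exponential curve, the fit recovers the slope and intercept used to generate them; real measurements would scatter around the line.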


Within the range of training-set sizes tested, the minimum values given by the two derived 
equations are 1.9752 and 1.90373 respectively. 
These findings suggest that the lowest average margin of error that can be achieved by the 
linear regression machine learning algorithm in attempting to predict the outcome of the Ruby 
random integer generator is approximately 2. This value represents the extent to 
which the random integer can be predicted by the linear regression algorithm and therefore 
answers the research question. It suggests that although a prediction of the random 
integer may not match the actual outcome of the random number generator exactly, it 
is quite close to it. Notably, the average margin of error decreased significantly, from 
approximately 5 to approximately 2, as the number of rows of data used for training was 
increased from 50 rows to 450 rows. 
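
The two derived equations can be evaluated directly for any training-set size. The following Ruby sketch does this with the rounded coefficients quoted in the straight-line equations; the `MODELS` table and the `error_margin` helper are names introduced here for illustration. With these rounded coefficients the values at 450 rows come out at roughly 1.96 and 1.89, close to but not identical with the 1.9752 and 1.90373 reported, presumably because the reported figures were computed from unrounded trendline coefficients (an assumption, since the unrounded values are not given).

```ruby
# Coefficients as quoted (rounded) in the straight-line equations
MODELS = {
  "Gradient descent" => { slope: -0.0016, intercept: 1.3951 },
  "Normal equation"  => { slope: -0.0017, intercept: 1.4029 }
}.freeze

# Ave. Error Margin = e^(slope * x + intercept)
def error_margin(model, rows)
  Math.exp(model[:slope] * rows + model[:intercept])
end

# Print the modelled error margin for a few training-set sizes
[50, 250, 450].each do |rows|
  MODELS.each do |name, model|
    puts format("%-16s x = %3d  error = %.4f", name, rows, error_margin(model, rows))
  end
end
```

Both expressions decrease continuously as x grows, so the smallest value within any tested range is always the one at the largest training set.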


Implications of Findings 

The findings of this investigation imply that machine learning could be a plausible way of 
predicting the outcomes of pseudo-random number generators, given a sufficient amount of 
data for training. This could explain why systems that demand high security, such 
as gambling systems and encryption algorithms, do not use PRNGs. Conversely, the 
findings also suggest that the linear regression machine learning algorithm is accurate 
enough to be applied in fields that require less security. One such field is the prediction of 
grades in the IB. In anonymous data provided by the school showing past students’ 
predicted grades and actual grades, the average margin of error calculated was 2. As the 
linear regression model produced a similar margin of error in this investigation, one useful 
application of the linear regression machine learning algorithm could be the prediction of 
students’ grades. In this instance, the linear regression algorithm could be trained with the 
grades of graduated students from their IB1 first semester exam to their final IB exam. Once 
the linear regression algorithm is trained, it can be fed the grades of current students from 
their IB1 first semester exam to the last exam they write before the final exam. The algorithm 
can then be made to predict the final IB grade of each student based on the trends found in 
the training sets. The prediction of IB grades using the linear regression algorithm is also 
considered plausible because of the small range of possible IB grades, 1 to 7. With such a 
small range, the pattern in the grades achieved by an IB student may be easier to find than it 
would be with a larger range. 


Limitations of Investigation 


The limitations of the investigation include the following: 


1. The range of numbers generated by the pseudo-random number generator was small. 
The data experimented on consisted of random integers ranging from 0 to 9. This is a 
limitation as only single-digit integers were considered whereas, in practice, multi-digit 
random integers are often used, because a wider range of values makes the pattern of 
random integers generated harder to find, thereby increasing their security. 

2. Another limitation of the investigation stems from the fact that only random integers 
were considered during the experimentation process. This investigation did not 
consider the predictability of random decimal numbers. Typically, random decimal 
numbers would render the pattern of random numbers more difficult to predict since 
there would be more permutations of figures available to form the pattern. 


Conclusion 
The investigation sought to find the extent to which the linear regression machine learning 
algorithm can predict the outcome of the random integer generator in the Ruby programming 
language. At the end of the process of experimentation and analysis of results, it can be 
concluded that linear regression as a machine learning algorithm can be used to predict 
outcomes of the random integer generator with an average margin of error of 2. This 
conclusion stands as reliable despite the limitations of this investigation since the linear 
regression algorithm can be modified to predict the outcomes of random sequences with more 
complex patterns. This investigation only serves as a demonstration of the vast possibilities of 
prediction using linear regression as a machine learning algorithm. 
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Appendix 


Ruby Code Used to generate Random integers 


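#!/usr/bin/env ruby
# Prints 500 rows of 30 comma-separated random integers,
# using a new seed (10, 11, 12, ...) for each row.
seed = 10
count = 0
while count < 500
  prng = Random.new(seed)
  # rand(10) gives integers from 0 to 9, the range stated under Limitations
  30.times { print "#{prng.rand(10)}, " }
  seed += 1
  count += 1
  puts ""
end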


Source: Author 
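
The script above creates a fresh generator for every row, so each row is fully determined by its seed. The short Ruby sketch below illustrates that property using the standard `Random` class: two generators built from the same seed yield the same sequence. The upper bound of 10 is an assumption taken from the 0 to 9 range described in the limitations, and the variable names are illustrative only.

```ruby
# Two generators created with the same seed yield identical sequences
a = Random.new(10)
b = Random.new(10)
seq_a = Array.new(30) { a.rand(10) }
seq_b = Array.new(30) { b.rand(10) }
puts "same seed, same sequence: #{seq_a == seq_b}"

# A generator with a different seed produces its own sequence
c = Random.new(11)
seq_c = Array.new(30) { c.rand(10) }
puts "seed 10 row: #{seq_a.join(', ')}"
puts "seed 11 row: #{seq_c.join(', ')}"
```

This determinism is what makes the generator pseudo-random rather than truly random: anyone who knows the seed can regenerate the row exactly.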
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Linear Regression Algorithm (On Octave) used to train and test data values 


clear; close all; clc

% Load the data: each row holds 30 comma-separated random integers
data = csvread('EEtestdata5.txt');
X = data(1:450, 1:29);   % first 29 integers of each row are the features
y = data(1:450, 30);     % the 30th integer is the value to predict
m = length(y);

% Scale features and set them to zero mean
fprintf('Normalizing Features ...\n');
[X mu sigma] = EEfeatureNormalize(X);

% Add intercept term to X
X = [ones(m, 1) X];

fprintf('Running gradient descent ...\n');
alpha = 0.001;
num_iters = 20000;

% Init Theta and Run Gradient Descent
theta = zeros(30, 1);
[theta, J_history] = EEgradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Calculate the parameters from the normal equation
theta = EEnormalEquation(X, y);

% Display normal equation's result
fprintf('Theta computed from the normal equations: \n');
fprintf(' %f \n', theta);
fprintf('\n');

% Estimate the 30th number of random sequence
% R is the index of the row being tested (its value is not given in this listing)
A = data(R, 1:29);
A = [ones(size(A, 1), 1) A];   % add intercept term
thirtieth_number = A * theta;
fprintf(['Predicted 30th integer ' ...
         '(using normal equations):\n %f\n'], thirtieth_number);


Adapted from (Stanford University) 
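
The helper functions called in the listing above (`EEfeatureNormalize`, `EEgradientDescentMulti` and `EEnormalEquation`) are not reproduced in this appendix. As a rough guide to what the first two steps typically involve, the following is a minimal Ruby sketch of feature scaling and batch gradient descent for linear regression. It is a hypothetical equivalent written for illustration, not the Octave code used in the investigation; the function names, the learning rate and the small data set are all assumptions, and the data follows y = 3 + 2a - b exactly so that the result can be checked.

```ruby
# Hypothetical Ruby equivalents of the helper steps; not the Octave code used.

# Scale each column to zero mean and unit (sample) standard deviation.
def feature_normalize(rows)
  cols = rows.transpose
  mu = cols.map { |c| c.sum / c.length.to_f }
  sigma = cols.each_with_index.map do |c, j|
    Math.sqrt(c.sum { |v| (v - mu[j])**2 } / (c.length - 1).to_f)
  end
  scaled = rows.map { |r| r.each_with_index.map { |v, j| (v - mu[j]) / sigma[j] } }
  [scaled, mu, sigma]
end

# Batch gradient descent on the cost J = (1/2m) * sum((prediction - y)^2).
def gradient_descent(x, y, alpha, iters)
  m = y.length.to_f
  theta = Array.new(x.first.length, 0.0)
  iters.times do
    residuals = x.each_with_index.map do |row, i|
      row.zip(theta).sum { |v, t| v * t } - y[i]
    end
    gradient = theta.each_index.map do |j|
      x.each_with_index.sum { |row, i| residuals[i] * row[j] } / m
    end
    theta = theta.each_index.map { |j| theta[j] - alpha * gradient[j] }
  end
  theta
end

# Synthetic data following y = 3 + 2a - b exactly (illustration only)
features = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
targets = features.map { |a, b| 3 + 2 * a - b }

scaled, mu, sigma = feature_normalize(features)
x = scaled.map { |r| [1.0] + r }   # add intercept term
theta = gradient_descent(x, targets, 0.1, 2000)

# Predict for a new row, scaling it with the training mu and sigma
new_row = [2.5, 3.0]
z = [1.0] + new_row.each_with_index.map { |v, j| (v - mu[j]) / sigma[j] }
prediction = z.zip(theta).sum { |v, t| v * t }
puts format("prediction = %.4f", prediction)
```

The normal equation approach minimises the same cost in closed form, theta = (X^T X)^{-1} X^T y, instead of iterating. In either case, a row used for prediction has to be scaled with the same mu and sigma that were computed from the training data.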




Computer Science Extended Essay 


Software Used 
Software Used to generate random numbers: Komodo IDE 11 


Software used to implement linear regression algorithm: GNU Octave 
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